This page last changed on Oct 30, 2008 by straha1.

Nodes

hpc has a total of 33 nodes of two types:

  • User node: hpc has one user node of the same type as the compute nodes below, but with 17 GB of memory. It is reached by logging in to hpc.rs.umbc.edu and is the only node directly accessible from the outside network. Users access all their files, edit code, compile programs, and submit jobs to the scheduler from this node.
  • Compute nodes: hpc presently has 32 compute nodes, each with two dual-core AMD Opteron processors (2.6 GHz, 1024 kB cache), 13 GB of memory, and a 170 GB local hard drive. These nodes are named node001, ..., node032 and can in principle be reached from the user node, e.g., by "ssh node032". However, the compute nodes are dedicated to running scheduled jobs: interactive use of the compute nodes is not allowed, and all jobs must be submitted to the scheduler on the user node above.
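The zero-padded node names above follow a fixed pattern; as a quick illustration (a sketch only, not something to run on the compute nodes), the names can be generated in the shell with printf:

```shell
# Generate zero-padded compute node names of the form node001 ... node032;
# printf's %03d pads the index to three digits.
for i in 1 16 32; do
    printf 'node%03d\n' "$i"
done
# → node001
# → node016
# → node032
```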

Network Hardware

Two networks connect all components of the system:

  • For communication among the processes of a parallel job, a high-performance InfiniBand interconnect with low latency and high bandwidth connects all nodes and also links them to the main scratch storage on the IB storage (see below). This amounts to having a parallel file system available during computations.
  • A conventional Ethernet network connects all nodes and is used for the operating system and for connections from the user node to the compute nodes that do not require high performance.

Storage

There are additional storage areas on the system, for instance for the operating system and other purposes, but four storage areas are relevant from the user's standpoint:

  • home directory – Each user has a home directory on the /home partition. This partition is 200 GB and is backed up by OIT. Because the partition is small, each user may store only 100 MB of data in their home directory.
  • InfiniBand-connected storage – This is a fast storage area that is not backed up at all; it is intended as temporary scratch space. The file servers holding this data have some redundant hardware, allowing them to survive a small number of hardware failures without loss of data; if too much hardware fails at once, some data may be lost. Each research group is given space in this storage area, accessible via ~/scratch (personal per-user scratch space) and ~/common/ (group shared scratch space).
  • UMBC Research Data Storage – This is a reliable storage area that can be accessed from anywhere at UMBC, including the HPC head node, and any UMBC researcher can purchase space on it. Each file server in this storage network has redundant hardware, so minor hardware failures cause no data loss; only if a server suffers too many hardware failures at the same time could data be lost. However, all data is stored twice: once on servers in the Public Policy building and once on servers in Engineering. Thus even the failure of an entire file server, or an entire server room, will not lose your data. In addition, ten daily archives of the data are kept, so accidentally deleted data can be restored. (The archives take very little space, since the filesystem stores only the changes made during each 24-hour period.)
  • AFS storage – Your AFS partition is the directory where your personal files are stored when you use the OIT computer labs or the gl.umbc.edu login nodes. The UMBC-wide /afs tree can be accessed on HPC via the /afs directory. Your AFS partition is /afs/umbc.edu/u/s/username, where username is your username, u is the first letter of your username, and s is the second letter. Your /afs/umbc.edu/u/s/username/pub directory is where you store files visible to other users, such as your UMBC web page in pub/www/. Your /afs/umbc.edu/u/s/username/home directory is your AFS home directory.
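The two-letter subdirectories in the AFS path can be derived from the username itself. The following sketch shows the construction, using "jsmith" as a hypothetical username:

```shell
# Build the AFS partition path /afs/umbc.edu/u/s/username, where u and s
# are the first and second letters of the username. "jsmith" is a
# hypothetical example; substring expansion ${var:offset:length} is a
# bash feature.
username="jsmith"
afs_home="/afs/umbc.edu/${username:0:1}/${username:1:1}/${username}"
echo "$afs_home"
# → /afs/umbc.edu/j/s/jsmith
```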

Long-term data storage is not part of HPC; that is the purpose of the UMBC Research Data Storage. The advantage of this solution for long-term storage is that the data is mirrored, so it can survive the failure of an entire file server, or an entire server room, without loss, and moreover it can be mounted on other machines on campus, including HPC. Thus the data is available on the head node of HPC, from which it can be copied to the IB storage for use during computations.
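The staging pattern described above can be sketched in the shell. The /tmp paths here are placeholders standing in for a group's Research Data Storage mount and IB scratch area, not actual mount points on hpc:

```shell
# Sketch of staging input data onto fast IB scratch before a run and
# copying results back to long-term storage afterward. The /tmp paths
# below are placeholders for the real, site-specific mount points.
rds="/tmp/demo_rds/project"    # placeholder for Research Data Storage
scratch="/tmp/demo_scratch"    # placeholder for ~/scratch on IB storage
mkdir -p "$rds" "$scratch"
echo "input data" > "$rds/input.txt"

cp -r "$rds" "$scratch/"       # stage in before the computation
# ... run the job against $scratch/project ...
echo "results" > "$scratch/project/output.txt"
cp "$scratch/project/output.txt" "$rds/"   # stage results back out
```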

Document generated by Confluence on Mar 31, 2011 15:37